41 research outputs found
Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices
A task-based formulation of Scalable Universal Matrix Multiplication
Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is
applied to the multiplication of hierarchy-free, rank-structured matrices that
appear in the domain of quantum chemistry (QC). The novel features of our
formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and
(2) fine-grained task-based composition. These features make it tolerant of the
load imbalance due to the irregular matrix structure and eliminate all
artifactual sources of global synchronization. Scalability of iterative
computation of square-root inverse of block-rank-sparse QC matrices is
demonstrated; for full-rank (dense) matrices the performance of our SUMMA
formulation usually exceeds that of the state-of-the-art dense MM
implementations (ScaLAPACK and Cyclops Tensor Framework).
Comment: 8 pages, 6 figures, accepted to IA3 2015. arXiv admin note: text
overlap with arXiv:1504.0504
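For orientation, the textbook SUMMA loop that the abstract builds on can be sketched as below. This is a minimal, sequential simulation in plain Python, not the authors' task-based formulation: it runs the iterations one after another and therefore does not show the concurrent scheduling of multiple iterations or any handling of block-rank-sparse structure. The function names (`summa`, `split_blocks`, and so on) and the square process grid are assumptions made for illustration.

```python
def matmul(a, b):
    """Dense product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_into(c, d):
    """Accumulate block d into block c in place."""
    for i, row in enumerate(d):
        for j, value in enumerate(row):
            c[i][j] += value


def split_blocks(m, nb):
    """Split an n x n matrix into an nb x nb grid of equal square blocks."""
    bs = len(m) // nb
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(nb)] for i in range(nb)]


def summa(a, b, nb):
    """Simulate SUMMA on an nb x nb grid of (hypothetical) processes."""
    n = len(a)
    assert n % nb == 0, "matrix size must be divisible by the grid size"
    bs = n // nb
    A, B = split_blocks(a, nb), split_blocks(b, nb)
    C = [[[[0] * bs for _ in range(bs)] for _ in range(nb)] for _ in range(nb)]
    for k in range(nb):
        # Iteration k: block column k of A is "broadcast" along process rows,
        # block row k of B is "broadcast" along process columns.
        col_panel = [A[i][k] for i in range(nb)]
        row_panel = [B[k][j] for j in range(nb)]
        for i in range(nb):
            for j in range(nb):
                # Process (i, j) performs a local rank-bs update of its C block.
                add_into(C[i][j], matmul(col_panel[i], row_panel[j]))
    # Reassemble the distributed blocks into one matrix.
    return [sum((C[i][j][r] for j in range(nb)), [])
            for i in range(nb) for r in range(bs)]
```

In this sketch every iteration ends before the next begins, which is the implicit synchronization the abstract says a task-based composition avoids.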
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis
Data movement is the dominating factor affecting performance and energy in
modern computing systems. Consequently, many algorithms have been developed to
minimize the number of I/O operations for common computing patterns. Matrix
multiplication is no exception, and lower bounds have been proven and
implemented both for shared and distributed memory systems. Reconfigurable
hardware platforms are a lucrative target for I/O minimizing algorithms, as
they offer full control of memory accesses to the programmer. While bounds
developed in the context of fixed architectures still apply to these platforms,
the spatially distributed nature of their computational and memory resources
requires a decentralized approach to optimize algorithms for maximum hardware
utilization. We present a model to optimize matrix multiplication for FPGA
platforms, simultaneously targeting maximum performance and minimum off-chip
data movement, within constraints set by the hardware. We map the model to a
concrete architecture using a high-level synthesis tool, maintaining a high
level of abstraction that allows us to support arbitrary data types and enables
maintainability and portability across FPGA devices. Kernels generated from our
architecture are shown to offer competitive performance in practice, scaling
with both compute and memory resources. We offer our design as an open source
project to encourage the open development of linear algebra and I/O minimizing
algorithms on reconfigurable hardware platforms.
Equilibrium gas-phase structures of sodium fluoride, bromide, and iodide monomers and dimers
The alkali halides sodium fluoride, sodium bromide, and sodium iodide exist in the gas phase as both monomer and dimer species. A reanalysis of gas electron diffraction (GED) data collected earlier has been undertaken for each of these molecules using the EXPRESS method to yield experimental equilibrium structures. EXPRESS allows amplitudes of vibration to be estimated and correction terms to be applied to each pair of atoms in the refinement model. These quantities are calculated from the ab initio potential-energy surfaces corresponding to the vibrational modes of the monomer and dimer. Because they include many of the effects associated with large-amplitude modes of vibration and anharmonicity, we have been able to determine highly accurate experimental structures. These results are found to be in good agreement with those from high-level core-valence ab initio calculations and are substantially more precise than those obtained in previous structural studies.
Parcelles de terre chersonésiennes au début du IIIe s. av.n.è.
Solomonik E. I., Nikolaenko G. M., Gaudey Jacqueline. Parcelles de terre chersonésiennes au début du IIIe s. av.n.è.. In: Esclavage et dépendance dans l'historiographie soviétique récente. Besançon : Université de Franche-Comté, 1995. pp. 185-210. (Annales littéraires de l'Université de Besançon, 577
A communication-optimal N-body algorithm for direct interactions
We consider the problem of communication avoidance in computing interactions between a set of particles in scenarios with and without a cutoff radius for interaction. Our strategy, which we show to be optimal in communication, divides the work in the iteration space rather than simply dividing the particles over processors, so more than one processor may be responsible for computing updates to a single particle. Similar to a force decomposition in molecular dynamics, this approach requires up to √p times more memory than a particle decomposition, but reduces communication costs by factors up to √p and is often faster in practice than a particle decomposition [1]. We examine a generalized force decomposition algorithm that tolerates the memory-limited case, i.e., when memory can only hold c copies of the particles for c = 1, 2, ..., √p. When c = 1, the algorithm degenerates into a particle decomposition; similarly, when c = √p, the algorithm uses a force decomposition. We present a proof that the algorithm is communication-optimal and reduces critical path latency and bandwidth costs by factors of c² and c, respectively. Performance results from experiments on up to 24K cores of Cray XE-6 and 32K cores of IBM Blue Gene/P machines indicate that the algorithm reduces communication in practice. In some cases, it even outperforms the original force decomposition approach because the right choice of c strikes a balance between the costs of collective and point-to-point communication. Finally, we extend the analysis to include a cutoff radius for direct evaluation of force interactions. We show that with a cutoff, communication optimality still holds. We sketch a generalized algorithm for multi-dimensional space and assess its performance for 1D and 2D simulations on the same systems. © 2013 IEEE
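The role of the replication factor c can be illustrated with a small sequential simulation. This is a hedged sketch, not the authors' implementation: the team layout (p/c teams of c processors, each team holding one block of particles), the toy 1D `pair_force`, and all function names are assumptions chosen for illustration. Under this layout each processor handles p/c² source blocks, which is consistent with the c² reduction in the number of communication steps that the abstract reports.

```python
def pair_force(xi, xj):
    """Toy 1D interaction (assumed): signed inverse-square, zero on self."""
    d = xj - xi
    return 0.0 if d == 0 else d / abs(d) ** 3


def direct_forces(xs):
    """Reference all-pairs evaluation on a single processor."""
    return [sum(pair_force(xi, xj) for xj in xs) for xi in xs]


def replicated_forces(xs, p, c):
    """Simulate p processors that keep c copies of the particle data.

    Returns the forces and the number of source blocks each processor
    visits (a stand-in for its number of shift steps).
    """
    assert p % c == 0, "c must divide p"
    teams = p // c
    assert teams % c == 0, "this sketch needs c to divide p / c"
    assert len(xs) % teams == 0, "particles must split evenly over teams"
    size = len(xs) // teams
    blocks = [xs[t * size:(t + 1) * size] for t in range(teams)]
    shifts_per_proc = teams // c  # equals p / c**2
    forces = []
    for t in range(teams):
        # Each of the c team members accumulates a partial force vector
        # for the team's own particles from its share of source blocks.
        partial = [[0.0] * size for _ in range(c)]
        for r in range(c):
            for s in range(shifts_per_proc):
                src = blocks[(t + r * shifts_per_proc + s) % teams]
                for i, xi in enumerate(blocks[t]):
                    partial[r][i] += sum(pair_force(xi, xj) for xj in src)
        # Reduction within the team combines the c partial results.
        forces.extend(sum(partial[r][i] for r in range(c))
                      for i in range(size))
    return forces, shifts_per_proc
```

With c = 1 every processor visits all p blocks, as in a particle decomposition; with c = √p each processor visits a single block, as in a force decomposition.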
High performance tensor-vector multiplication on shared-memory systems
International audienc